In this section, I use the dataframe, which is now clean and tidy after the wrangling process in the last section, to do some data analysis and present the results with visualizations.
I am interested in several questions, which can be divided into two groups. The first group looks at one variable at a time. The second group looks at more than one variable, so we can see how some variables relate to each other.
Group 2:
# The table shows the distribution of dog breeds in WeRateDogs
Golden Retriever is the most common breed in WeRateDogs with nearly 160 appearances, far more than Labrador Retriever in second place (~110 appearances). Pembroke is in third place, only slightly ahead of Chihuahua, which comes next.
A slight digression: I wondered what a Pembroke is. I searched the name on Google and found that it actually means Pembroke Welsh Corgi! For me they should be the champion because they are the cutest dogs in the world!
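A frequency table like this one can be sketched with the standard library alone. The sample list and the helper name `top_counts` are made up for illustration and are not the notebook's actual data or code. With a pandas Series, `value_counts()` would give the same kind of result.

```python
from collections import Counter


def top_counts(values, n=3):
    """Return the n most frequent non-missing values with their counts."""
    cleaned = [v for v in values if v is not None]
    return Counter(cleaned).most_common(n)


# Made-up sample, NOT taken from the WeRateDogs data.
breeds = [
    "golden_retriever", "pembroke", "golden_retriever", None,
    "labrador_retriever", "golden_retriever", "labrador_retriever",
]

for breed, count in top_counts(breeds):
    print(breed, count)
```

The same helper works for the dog name and dog stage counts, since each of them is also a single column of labels.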
# This table shows the frequency of dog names in WeRateDogs
The result is quite even. Charlie and Lucy are the most common names (11 times each), but only slightly ahead of the others. Oliver and Cooper are in second place with 10 appearances each. This is an interesting question for me because I only know how people name their dogs in my own language, and it is good to see how it is done in English.
# This table shows the frequency of dog stages in WeRateDogs
Pupper is the dog stage that appears most often on the page, with a much higher frequency than doggo in second place. (Though I can't really tell the difference between them.)
I try something new here to present the answer. I want to find the tweet in question and show its picture directly, with the tweet ID and the like/retweet count marked on the picture.
# This is the picture which received the most likes
# This is the picture that got the most retweets
I didn't expect both answers to be the same picture. The picture above got both the most retweets and the most likes among all the tweets in WeRateDogs. In my opinion, though, there is nothing special about the picture.
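Finding the top tweet for each measure, and checking whether it is the same tweet, can be sketched as follows. The records are invented, and the field names `tweet_id`, `favorite_count` and `retweet_count` are assumptions about how the cleaned dataframe is laid out. With a pandas DataFrame, `idxmax()` on each column would do the same job.

```python
def top_tweet(tweets, field):
    """Return the record with the largest value in the given field."""
    return max(tweets, key=lambda tweet: tweet[field])


# Invented records, NOT real WeRateDogs tweets.
tweets = [
    {"tweet_id": 1, "favorite_count": 120, "retweet_count": 30},
    {"tweet_id": 2, "favorite_count": 950, "retweet_count": 410},
    {"tweet_id": 3, "favorite_count": 400, "retweet_count": 90},
]

most_liked = top_tweet(tweets, "favorite_count")
most_retweeted = top_tweet(tweets, "retweet_count")

# True when one tweet leads on both measures, as happened in the analysis.
same_tweet = most_liked["tweet_id"] == most_retweeted["tweet_id"]
print(most_liked["tweet_id"], most_retweeted["tweet_id"], same_tweet)
```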
# This table shows the distribution of rating given by the editor of WeRateDogs
Because of the special rating system, which is meant to make the tweets look funny, most of the ratings are equal to or higher than 10. 12 is the most common rating, followed by 10 and 11. The average rating given by WeRateDogs is 10.67, higher than the denominator as expected.
# I create a word cloud to see the words they use frequently
Not surprisingly, since this is a dog rating account, words like "pupper", "dog" and "pup" are very common. Words like "meet", "love", "hello" and "happy" are also used a lot, suggesting that the editor wants to make the page warm and cute. The funniest thing for me is that "af" is used heavily in their tweets, which made me laugh when I saw the word cloud.
# The table shows the average rating of different breeds of dog
# Get both mean and count, because a breed with more appearances is more representative
Bouvier des Flandres has the highest average rating of all the breeds, but it appears only once. Among breeds rated more than 5 times, Border Terrier has the highest average rating, 12.14. By the way, because I'm not so familiar with dogs, especially their English names, I decided to write a simple function that shows me a picture of a breed. The dog I would like to see first is the Border Terrier.
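The idea of reporting the mean together with the count, and then ignoring breeds with too few appearances, can be sketched like this. The rows are made up, and `breed_rating_summary` and its `min_count` parameter are hypothetical names, not the notebook's own code. In pandas the same result would usually come from a `groupby` followed by an aggregation with `"mean"` and `"count"`.

```python
from collections import defaultdict
from statistics import mean


def breed_rating_summary(rows, min_count=1):
    """Return (breed, mean rating, count) tuples, best average first.

    Breeds with fewer than min_count ratings are left out.
    """
    ratings_by_breed = defaultdict(list)
    for breed, rating in rows:
        ratings_by_breed[breed].append(rating)

    summary = [
        (breed, mean(ratings), len(ratings))
        for breed, ratings in ratings_by_breed.items()
        if len(ratings) >= min_count
    ]
    return sorted(summary, key=lambda item: item[1], reverse=True)


# Made-up (breed, rating) pairs, NOT real WeRateDogs data.
rows = [
    ("rare_breed", 14),
    ("border_terrier", 12), ("border_terrier", 13),
    ("pug", 10), ("pug", 11), ("pug", 9),
]

print(breed_rating_summary(rows))               # every breed
print(breed_rating_summary(rows, min_count=2))  # drops the single-appearance breed
```

For the "rated more than 5 times" cut used in the text, the threshold would be `min_count=6`.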
# This is Border Terrier
# The table shows which breeds got the most likes on average from Twitter users
Bedlington Terrier is the most popular breed on WeRateDogs, with 22642.5 likes on average. French Bulldog appears quite frequently (over 30 times) and received over 18000 likes on average, suggesting that people really love them.
Actually I have no idea what a Bedlington Terrier looks like, because so many dogs are called "Terrier", so I want to see one of its pictures.
# This is Bedlington Terrier...emmm...wait
Even though I know a Bedlington Terrier looks a bit like a sheep, I'm pretty sure this is a lamb. How come the image predictor judged it to be a Bedlington Terrier and gave that the highest confidence level among all the possibilities? So in the next question I would like to look into the confidence level.
# The table shows the average confidence level of different breeds
Irish Wolfhound and Bouvier des Flandres have confidence levels below 0.1, but each appears only once. Norwich Terrier also has few appearances and a low confidence level of only 0.25. Bedlington Terrier from the last question is also on the list, with a confidence level of 0.28. This time I would like to see a sample picture of a Norwich Terrier.
# This is Norwich Terrier...?
Is there any Norwich Terrier with such long legs? I'm not sure about the breed of this dog, but I highly doubt that it is a Norwich Terrier. Now I can understand why the confidence level for Norwich Terrier predictions is low.
Another question about confidence level. I was thinking that a low confidence level might mean the picture is funny, which would make it more likely to receive more likes. So we are going to look at the correlation between confidence level and favorite count.
# The scatter chart shows correlation between confidence level and favorite count
# calculate correlation value
0.07 is a very low positive correlation value, implying that there is no meaningful linear relationship between confidence level and favorite count. A lower confidence level doesn't help a tweet get more likes. My hypothesis is wrong.
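For reference, here is a minimal sketch of how a correlation value like this can be computed by hand. It assumes the value is Pearson's r, which is the default of pandas' `corr()` but is not stated explicitly in this analysis. The function name `pearson_r` is hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length number sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)

    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Toy numbers only: a perfectly increasing and a perfectly decreasing pair.
print(pearson_r([1, 2, 3], [2, 4, 6]))
print(pearson_r([1, 2, 3], [3, 2, 1]))
```

Values close to 0 mean there is almost no linear relationship, while values close to 1 or -1 mean a strong positive or negative one.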
# The scatter chart shows correlation between rating and number of likes
# calculate correlation value
A correlation value of 0.38 indicates a moderate positive linear relationship between the two variables. That means the higher a picture is rated, the more likes it tends to receive from Twitter users.
To sum up, we got some very interesting insights from the data. We know which breeds the editor tends to post about and what wording the posts use. We also know what kind of dog, or which style of picture, gets more likes and retweets. Just as importantly, the analysis tested some hypotheses I had in my head, which turned out to be right or wrong. This is a fun project.